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SIGNIFICANCE  AND  EXPLANATION 


A  specific  robust  estimator  would  behave  well  if,  in 
the  real  world,  data  occurred  of  the  kind  which  favored  it 


In  this  paper  more  classical  sets  of  data  are  considered 
and  their  characteristics  are  studied  in  relation  to  proposed 


robust  estimators 


The  responsibility  for  the  wording  and  views  expre^ssed  in  this 
descriptive  summary  lies  with  MRC,  and  not  with  the  authors  of 
this  report. 


A  STUDY  OF  REAL  DATA 


Oina  Chen 
George  E.  P.  Box 

T.  Introduction. 

The  appropriate  procedure  to  adopt  in  data  analysis  depends 
heavily  on  how  the  data  are  supposed  to  be  distr-’buted.  Exact 
distributions  of  data  are  rarely  known  since  errors  are  usually 
contributed  to  by  many  different  factors.  The  central  limit 
theorem  has  often  been  appealed  to  by  statisticians  as  a  justifi¬ 
cation  for  the  assumption  of  normality.  However,  more  recently 
it  has  been  argued  that  heavier-tailed  distributions  and  distri¬ 
butions  containing  outliers  might  be  more  likely  than  the  normal 
distribution  to  describe  the  behavior  of  real  errors.  New  esti¬ 
mators  have,  therefore,  been  proposed  which  v/ould  be  expected 
to  behave  well  if  these  other  assumptions  were  true.  Studies  of 
the  behavior  of  the  proposed  estimators  have  usually  been  made  by 
simulation.  Typically,  some  feared  distribution  or  distributions 
were  chosen  from  which  pseudo  random  samples  have  been  drawn. 

The  estimators  have  then  been  computed  and  various  of  their  pro¬ 
perties  calculated  and  compared.  Most  frequently, efficiencies 

(or  asymptotic  variances)  were 

calculated  for  different  estimators  under  various  distribution  as¬ 
sumptions,  a  robust  estimator  would  be  one  having  a  high  efficiency 
(or  comparatively  low  asymptotic  variance)  over  all  the  tested 
distribution*:  The  difficulty  here  is  that  such  study  is 


[n  times  Fisher  information] 
variance  of  the  estimator 
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necessarily  rather  subjective  since  it  depends  on  what  distribu¬ 
tions  are  included. 

Stigler  (1977)  raised  the  question  "Do  robust  estimators 
work  with  real  data?"  He  applied  different  proposed  estimators 
to  several  sets  of  real  data,  and  measured  the  perfonmance  of 
each  estimator.  With  the  data  he  considered,  the  10«  trimmed  mean 
was  one  of  the  best  estimators  and  the  ordinary  sample  mean  came 
close  as  a  competitor.  Estimators  selected  from  the  best  modern 
contenders  did  not  come  out  very  well.  As  commented  by  Stigler 
"the  alternatives  to  an  independent  identically  normally  distri¬ 
buted  sample  that  have  been  considered  by  modern  workers  are  too 
restricted  or  too  exaggerated  to  reflect  accurately  'real'  data." 
Later  in  this  chapter,  therefore,  we  shall  examine  a  number  of  sets 
of  data  as  a  contribution  towards  finding  out  what  data  from  the 
real  world  is  like.  As  a  means  of  approximately  characterizing 
data  sets  and  relating  them  to  the  results  of  TSR  i?1997,  we  shall 
consider  again  the  contaminated  exponential  power  distribution 


=  (l-a)P(y|e,a,B)+aP{yle,ko,B) 


with 


r 


P(yl0,o,B)  =  w{B)o"^exp|-c(B) 
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and  a  >  0,  -to<o<»,  -1<B<1,  k>l. 
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This  provides  a  reasonably  broad  family  of  symmetric  distributions 
containing  both  light-tailed  and  heavy-tailed  members.  To  charac¬ 
terize  the  parent  distribution  for  each  set  of  data  v/e  shall  fit 
this  distribution  to  the  data  by  maximimum  likelihood.  There  are 
four  parameters  0,  a,  a,  6  altogether  (k  is  fixed  at  3).  In 
this  chapter,  however,  we  are  particularly  interested  in  tv/o 
parameters,  a  and  B,  which  characterize  the  shape  of  the  dis¬ 
tribution.  For  each  data  set,  therefore,  maximum  likelihood 
estimates  for  all  four  parameters  will  be  obtained  and  approxi¬ 
mate  confidence  regions  for  (3, a)  with  6, a  fixed  at  their 
maximum  likelihood  estimates  will  be  calculated. 

Analyzing  the  Real  Data  Sets 

In  this  study  of  searching  for  the  parent  distributions 
of  real  data  sets,  one  must  keep  in  mind  that  we  are  assuming  the 
data  set  is  a  random  sample  from  some  particular  parent  distri¬ 
bution  in  the  family  P  (ylG,o,S,a).  This  implies  that 

V* 

(i)  The  observations  are  independent. 

(ii)  The  observations  are  stationary,  i.e.,  have  a 
fixed  mean  and  variance. 

These  assumptions  must  be  checked, because  their  violation 


could  invalidate  the  estimation  procedure.  If  the  assumptions 
appear  to  be  satisfied,  one  can  proceed  as  follows: 


9 


1.  Maximum  likelihood  estimates  for  6,  o,  a  may  be 
obtained  by  maximizing  the  log  likelihood  function, 

n 

I  £og[{l“a)P(yJO,o,0)+aP(y  J0,3o,B)]. 

1=1  '  ' 

This  may  be  done  by  evaluating  this  log  likelihood  function 
over  a  suitable  grid  of  parameter  values  and  obtaining  the 

A  A  A  A 

maximum  (0,a,B,a)  numerically. 

2.  A  very  approximate  100(l-e)»  confidence  region  is  now 
given  by 

-2  I  {£og[(l-a)P(yj0,o,6)+aP(yj0,3o,B)] 

1»1  ^  ^ 

-iogC(l-qf)P{y^|e,a,e)+ctP(y^le,3a,8)]>  <  Xg(2) 

where  x^{2)  Is  the  lOOe  percent  point  of  the  chi- 

square  distribution  with  two  degrees  of  freedom  since 
we  have  fixed  two  parameters. 

We  will  thus  have  some  idea  about  what  positions  the  distributions 
of  real  data  might  take  in  the  (6, a)  space.  In  TSR*‘1997,  we 
mentioned  that  trade  offs  can  be  expected  between  a  and  p 
yielding  oblique  ridgy  confidence  regions  oriented  from  upper  left 
to  lower  right  in  the  B,o  plane.  It  turns  out  that  it  is  also 
possible  to  have  two  local  maxima  in  the  log  likelihood  function 
surface,  one  representing  a  highly  contaminated  short-tailed 
distribution  and  the  other  a  heavy-tailed  distribution  with  small 
or  no  contamination. 


r 

r 


3.  Daily  Changes  in  Price  of  CG"^"!on  Stock  (Cox  and  Jenkins) 


Box  and  Jenkins  (1970)  give  records  of  daily  IBM  cosision 
Stock  closing  prices  from  17th  May  1961  -  2nd  November  1962 
(Series  B)  and  29th  June  1959  -  30th  June  1960  (Series  B'). 

Series  B  has  369  observations  and  Series  B'  has  255  observations. 
It  has  been  suggested  (Bachelier,  1900)  that  the  first  differences 
of  stock  prices  behave  like  i.i.d,  random  sample  from  some 
distribution.  The  differenced  series  are  plotted  in  Figure  l. 

For  Series  B,  considering  first  the  whole  sample,  we  have, 

(0,0, B, a)  =  (0.0,  6.65,  0.60,  0.03) 

and  both  very  approximate  90%  and  95%  confidence  regions  for 
(B,o)  are  given  in  Figure  2(a). 

Wichern,  Miller  and  Hsu  (1976)  showed  that  the  variance 
became  unstable  in  the  latter  part  of  the  series.  They  suggested, 
however,  that  the  first  179  values  seem  to  be  a  homogeneous 
sample.  An  analysis  of  these  first  179  observations  yields 

A  A  A 

(0,0,B,a)  =  (.4,  5.3,  .25,  0.00)  . 


90%  and  95%  confidence  regions  for  {B,a)  are  shown  in  Figure 
2(b)  .  This  time  the  90%  confidence  region  actually  contains 
the  normal  distribution  (origin),  and  is  rather  elongate^ ij^ 

v  % 


the  direction  expected. 
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Figure  1  .  First  Differences  for  Series  B  and  Series 


BETA 

Figure  2(a).  Approximate'  confidence  regions  for  (6, 
(First  differences  of  Series  B) 


Approximate  confidence  regions  for  (B,a) 

(First  differences  of  Series  B,  first  179  values) 


The  example  shows  that  for  these  data,  tail  heaviness  of 
the  whole  sample  arises  because  of  changes  in  variance  in  the 
latter  part  of  the  series.  It  seems  that  for  the  first  179  obser¬ 
vations  the  parent  distribution  could  very  well  be  represented 
by  a  normal,  or  slightly  contaminated,  normal  distribution.  As 
for  Series  B',  maximum  likelihood  estimation  was  carried  out 
for  the  first  differences  of  the  series  to  obtain 

A  A  ^  ^ 

(6,0, B, a)  =  (0.0,  3.54,  0.20,  0.16)  and  the  approximate  con¬ 

fidence  regions  shown  in  Figure  2(c).  The  confidence 
regions  are  elongated  as  expected  showing  the  trade  off  betv/een 
0  and  a.  A  contaminated  normal  distribution  with  a  ~  .25 
could  well  represent  the  data. 

4.  Measures  of  Velocity  of  Light 

Measurement  of  the  Velocity  of  Light  in  a  Partial  Vacumm  by 
Michel  son,  Pease  and  Pearson  (Michelson,  Pease  and  Pearson  (1935)) 

During  the  period  September  1929  to  March  1933,  the 
velocity  of  light  was  measured  at  the  Irvine  Ranch  near  Santa  Ana, 
California  by  Michelson,  Pease  and  Pearson.  The  observations 
were  made  by  the  rotating-mirror  method.  In  all  233  series  were 
recorded,  the  averages  for  which  are  plotted  in  Figure  3. 

Every  series  represents  two  to  eight  sets  of  observations 

taken  during  over  about  an  hour's  time  with  each  set  of  obser¬ 
vations  usually  furnishing  three  values  of  velocity  of  light. 
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(First  differences  of  Series  3') 


All  these*  observations  v/cre  actually  made  between  February  1931 
and  March  1933.  In  1931,  observations  were  not  made  continuously 
over  the  year;  instead,  a  group  of  observations  were  made  in 
February,  another  group  in  late  March  and  early  April  and  so  on. 

The  means  and  variances  of  these  groups  fluctuate  considerably. 

The  155  observations  made  after  April  1932  were,  hoivever,  much 
more  homogeneous.  Behavior  of  this  kind  is  common  where  a  new 
method  of  scientific  measurement  takes  some  time  to  "debug"  and 
to  settle  down  to  produce  homogeneous  results. 

We  shall,  therefore,  again  perform  tv;o  analyses— the  first 
for  all  the  data  and  the  second  for  the  last  155  observations. 

(i)  For  the  whole  data  set 

(e,o,B,a)  =  (774.0,  14.6,  0.0) 

Approximate  confidence  regions  for  (3, a)  are  shown  in 
Figure  4(a). 

(ii)  For  the  155  observations  between  April  1932  and  February  1933 

(O.o.B.a)  =  (773.1,  9.47,  -.35,  0.10) 

Approximate  confidence  regions  for  (3, a)  are  drawn  in 
Figure  4(b). 

A  A 

Again,  notice  that  the  maximum  likelihood  estimates  (3,ci)  move  from 
(.80,  0.0)  to  (-.35,  0.10)  as  we  leave  out  the  inhomogeneous  part 
of  the  data,  and  the  confidence  regions, which  originally  cover 
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Figure  4(a)  .  Confidence  regions  for  (3, a) 

(l-ieesurerrents  of  the  velocity  of  light  by 
i'.ichelson.  Pease,  Pearson) 
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Figure  4(b).  Confidence  regions  for  (3, a) 

(155  measurements  of  the  velocity  of  light  made 
between  April  1932  and  February  1933  by  Michel  son. 
Pease  and  Pearson) 


the  lov/cr  right  corner,  nov/  cover  the  central  part  and  sonio  upper 
left  part  of  the  plane.  Also  note  that  v/hile.for  the  homogeneous 
part  of  the  data,  the  90%  confidence  region  actually  contains  the 
normal  distribution  (0.0,  0.0)  and  the  maximum  likelihood  esti¬ 
mate  implies  a  contaminated  light-tailed  distribution,  when 
considering  the  whole  data  set,  the  maximum  likelihood  estimate 
represents  a  heavy-tailed  distribution  somewhat  close  to  the 
double  exponential  and  the  95%  confidence  region  excludes  the 
normal  distribution. 

The  analysis  strongly  suggests, as  with  the  IBM  stock 
prices,  that  the  heavy  tails  are  caused  by  secular  inhomogeneity. 


Measures  of  the  velocity  of  light  by  Newcomb  and  Michel  son  in 
1880  and  1831 


A  series  of  measurements  of  the  velocity  of  light  were 
made  early  in  1880  and  1881  by  Newcomb  and  Michel  son  (Nev/comb, 
1891).  Their  method  consisted  in  essence  of  reflecting  light  off 
a  rapidly  rotating  mirror  to  a  distant  fixed  mirror  and  back  to 
the  rotating  mirror.  The  velocity  of  the  light  is  then  determined 
by  measuring  the  distance  involved,  the  speed  of  the  rotating 
mirror  and  the  angular  displacement  of  the  received  image  from 
its  source.  The  data  of  interest  recorded  here  are  the  time  of 
passage  in  millionths  of  second,  from  which  the  velocity  can  be 
calculated.  To  simplify  computation,  the  data  values  we  actually 


t 

I 


considered  are  (recorded  time  of  passage  -  17,000).  The  observa¬ 
tions  made  on  each  day  are  plotted  in  Figure  5.  We  can  see  from 
the  plot  that  there  are  appreciable  day  to  day  variations.  This 
was  to  be  expected  because  of  factors  such  as  weather  conditions. 
Some  of  the  factors  which  might  have  affected  the  results  were 
recorded.  For  example,  it  was  noticed  that  it  was  cloudy  on 
August  25,  and  that  images  were  faint  for  some  observations  so 
that  on  that  day  we  might  expect  a  larger  variance.  Again  on 
September  15,  the  instrument  was  adjusted,  and  this  might  be 
expected  to  influence  both  the  mean  and  variance  on  that  day.  With 
these  in  mind,  any  assumption  of  constant  mean  and  variance  is 
dubious.  One  way  to  get  rid  of  day  to  day  variation  is  to 
standardize  the  observations  within  each  day,  i.e.,  to  calculate 
the  mean  and  standard  deviation  for  data  on  each  day,  and  use 


observation  -  mean 
standard  deviation 


as  new  observations. 


What  we  have  done  with  this  set  of  data  is,  first,  analyze 
the  whole  set  of  the  original  data;  second,  leave  out  the  three 
obvious  outliers  as  shown  by  an  arrow  in  Figure  5  ;  and  finally, 
analyze  the  data  set  after  standardizing  within  each  day.  Note  that  in 
standardizing  the  data,  the  three  obvious  outliers  were  discarded, 
and  when  only  a  single  observation  was  made  in  a  day,  this  obsor- 
varion  was  also  excluded  since  no  standard  deviation  was  available 
for  it.  The  results  are 


Figure  5.  Measurements  of  the  velocity  of  light  by  Newcomb  and  Michelson  in  1880  and  1831. 


(i)  If  v.'c  take  the  whole  set  of  data  as  a  random  sample 
(n  =  150),  then 

(C,o»P,a)  =  (28.0,  4.16,  1.0,  .11 ) 

Approximate  confidence  regions  for  (B,cx)  are  given  in 
Figure  6(d)  ,  which  show  that  the  parent  distribution 
have  very  heavy  tail  (with  both  large  8  and  a  values) 
if  the  data  set  is  really  a  random  sample. 

(ii)  If  we  throw  out  three  obvious  wild  observations  and 
regard  the  rest  as  a  random  sample  (n  =  147),  then 

(6,0, 8,a)  =  (28.0,  4.3,  1^,  0.0) 

and  the  approximate  confidence  regions  for  (6, a)  are 
given  in  Figure  6(b).  These  confidence  regions  have 
quite  different  shapes  from  the  others,  the  90"  confidence 
region  has  two  parts, located  at  the  lower  right  and  upper 
left  plane,  respectively.  This  came  as  no  surprise 
because  we  know  there  is  compensation  between  8  and  a. 

A  distribution  with  heavy  tail  and  small  contamination 
could  be  soniewhat  similar  to  a  distribution  with  light  or 
moderate  tail  but  high  contamination,  and  for  this  parti¬ 
cular  set  of  data,  both  areas  in  the  90"  confidence  region 
contain  distributions  which  can  adequately  describe  the 
data. 
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(iii)  l.-e  standardise  the  observations  node  on  cacli  duy. 


Suppose  are  observations  wade  on  a  particular 

?  (yi-y) 

day,  y  and  =  >  — — —  arc,  respectively,  the  sair.plc 

i=l 

y.-y 

mean  and  the  sa::i?le  variance.  If  v.'e  take  r^  =  — —  as 
a  new  observation,  then  cc:::bining  the  standardized 
observations  on  each  day,  we  get  a  new  set  of  data  which 
is  honogeneous  in  both  mean  and  variance.  In  fact,  since 
Ir^  =  0  Zr?  =  n-1 ,  which  implies  E(}:r^)  =  0  and 

E(Zrp  =  n-1,  the  mean  and  variance  can  be  obtained  as 

E(r^)  =  0  and  Var(r.)  =  E(rp  =  ^  . 

With  this  ne\;  set  of  data. 


(e,o,P,a)  =  (-.05,  .94,  -.65,  0.0) 

and  approximate  confidence  regions  for  (p;,a)  are  shown 
in  Figure  3.6(c).  In  considering  this  calculation  it  must 
be  remembered  that  breaking  the  data  into  pieces  and  stan¬ 
dardizing  within  each  piece  will  greatly  distort  the 
distribution  shape.  If  the  parent  distribution  for  y^ 

is  normal  then,  from  syn'.motry,  the  coefficient  of  skewness 
for  r^  will  still  bo  zero,  hut  the  coefficient  of 
kurtosis  (X^)  for  r.  con  deviate  markedly  from  its 

normal  value  (X,  =  0). 
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Approximate  confidence  regions  for  (S.ci) 
(Newcomb  and  Michel  son  data  after 
standardization) 


Since  is  independent  of  a  and  s  is  a  sufficient 
statistic  for  o,  is  independent  of  s.  It  follows  that  the 
coefficient  of  kurtosis  can  be  obtained  as  below: 


E(yj-y)'  •  E((^)'-s')  =  E(r;  •  s‘)  =  E(rpn(s*) 

E(y  -y)‘ 

E('-i)  =  --  -- 
’  E(s') 

From  the  distributions  of  y^.-y  and  s,  v/e  then  have 

E{r\)  = 

^  n  (n+1 ) 

and  X,  =>  _Ei!lL  .  3  =  ^2^  -  3. 

^  Var(r,.)= 

The  coefficient  of  kurtosis  for  r^.  is  always  less  than 
0,  indicating  a  flat  tailed  distribution  for  r^.  As  n  gets 

larger,  the  coefficient  of  kurtosis  gets  closer  to  0  —  the  value 
for  a  normal  distribution.  In  this  case  (on  the  average  about 
six  observations  were  made  in  a  day), taking  n  =  6,  we  obtain 


E(r.)' 


-  3  =  -.858.  Lund  (1967)  calculated  coefficients  of 


Var(r^)* 

kurtosis  for  the  exponential  power  distributions.  As  shown  in 


Table  3.1,  an  exponential  power  distribution  with  8  =  -.50 
will  produce  equal  to  -.81.  After  some  calculation 
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B  -1.0  -  .75  -.50  -.25  0.0  .25  .50  .751.00 

X.  -1.20  -1.07  -.81  -.45  0.00  .55  1.22  2.03  3.00 

4 

Table  1 

one  can  also  show  that  for  3  =  -.55,  X^  =  -.87.  Therefore,  we 

concluded  that  if  the  parent  distribution  for  is 

normal,  one  would  expect  that  the  distribution  for  r^  (when  n 

is  approximately  6)  would  have  taiT  behavior  similar  to  that  of  an 
exponential  power  distribution  with  3  =  -.55  which  is  very  close 
to  the  maximum  likelihood  estimate  of  -.65.  Thus  it  appears 
that  the  normal  distribution  alone,  or  maybe  with  some  slight 
contamination, could  adequately  describe  the  data  if  there  had 
not  been  daily  changes  in  levels  and  variances. 

5.  Some  Data  Sets  Investigated  by  Tocher  (1928) 

In  1928,  Tocher  published  the  paper  "An  Investigation 
of  the  Milk  Yield  of  Dairy  Cows,"  in  which  a  large  amount  of  data 
were  given.  These  are  the  records  about  milk  yields  of  dairy 
cows  supplied  by  "The  Scottish  Milk  Records  Association."  Var¬ 
ious  "characters"  are  investigated  for  each  cow,  e.g.,  estimated 
total  quantity  of  milk  in  gallons  for  the  entire  lactation 
period,  total  quantity  of  butter  fat,  age  of  the  cow,  ....  etc. 
Tocher's  investigation  had  several  purposes.  In  one  interesting 
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analysis  he  fitted  Pearson  curves  to  a  number  of  large  data  sets. 
Three  of  these  data  sets  v;hich  we  will  consider  in  our  study  are 

(1)  Total  yield  of  butter  fat  in  pounds  for  449  cows  at  age  7. 

(2)  Total  yield  of  butter  fat  in  pounds  for  251  cows  at  age  9. 

(3)  Quantity  of  milk  in  gallons  for  240  cows  with  percentages 

of  butter  fat  betv/een  3.00  and  3.25. 

The  exact  data  values  are  not  available.  Tocher's  paper  gives 
frequency  distributions  for  the  data,  with  a  grouping  interval  of 
50  gallons  for  quantity  of  milk  and  20  pounds  for  yield  of  butter 
fat.  The  frequency  distributions  are  shown  in  Figure  7. 

Foil  owing  the  procedure  stated  before,  we  have 

(i)  For  total  yield  of  butter  fat  for  cows  at  age  7 

(0,0, 3,a)  =  (249.75,  52.15,  -.20,  .01  ) 

Approximate  confidence  regions  are  shown  in  Figure  8(a). 

(ii)  For  total  yield  of  butter  fat  for  cows  at  age  9 

(0.a,8,a)  =  (265.8,  60,15,  0.35,  0.0) 

Approximate  confidence  regions  are  given  in  Figure  8(b). 

(iii)  For  quantity  of  mill;  in  gallons  for  cows  with  percentages  of 
butter  fat  between  3.00  and  3.25. 

(e,o.8,a)  =  (664.0,  176.0,  0.25,  0.0) 

Approximate  confidence  regions  are  given  in  Figure  8(c). 
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figure  7  Frequency  distril)utions  for  Tocher's  data. 
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Figure  8(a)  Approximate  confidence  regions  for  (S,a) 
(Yield  of  butter  fat  for  cov;s  at  age  7) 


Figure  8(b)  Approximate  confidence  regions  for  (6, a) 
(Yield  of  butter  fat  for  cov;s  at  age  9) 
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Figure  8(c)  Approximate  confidence  regions  for  {B,a) 


An  the  confidence  regions  in  Figure  8  are  reasonably 
close  to  the  center  of  the  plane,  suggesting  that  the  data  sets 
could  be  random  samples  from  distributions  not  too  far  away  from 
the  normal.  It  is  interesting  to  notice  that  tiie  analysis  sug¬ 
gests  that  the  distribution  of  the  yields  of  the  butter  fat  for 
cows  at  age  7  is  a  short-tailed  distribution.  This  runs  contrary 
to  the  common  belief  that  most  data  sets  have  heavy-tailed  distri¬ 
butions.  Cox  (Discussion  of  Stigler's  paper,  1977),  however,  has 
said  that  according  to  his  experience,  short-tailed  distributions 
could  arise  as  well  as  heavy-tailed  distributions. 

These  three  sets  of  data  are  different  in  the  nature  from 
those  discussed  before,  all  three  sets  of  data  being  measurements 
made  on  members  of  a  particular  population  at  approximately  the 
same  time,  while  the  sets  discussed  previously  are  measurements 
or  observations  made  sequentially  over  a  long  period  of  time.  The 
latter  type  suffers  from  the  effect  of  secular  inhomogeneity  but 
the  former  does  not.  This  can  partly  explain  why  Tocher's  first 
three  sets  of  data  are  quite  close  to  a  normal  distribution. 

Due  to  the  high  cost  involved  in  collecting  the  data,  the 
Scottish  Milk  Record  Association  was  not  able  to  make  daily 
observations  of  milk  for  each  cow.  The  morning  and  evening  milk 
of  each  cow  was  tested  at  intervals  ranging  from  14  to  28  days. 

The  operation  was  repeated  after  an  interval  of  about  fourteen  days 
The  quantity  of  milk  and  the  percentage  of  butter  fat  yielded 
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by  each  cow  in  the  interval  were  assumed  to  bo  those  given  on  the 
date  of  the  previous  test.  So  the  yield  of  milk  determined  by 
the  Scottish  Milk  Record  Association  is  not  the  actual  yield  but 
an  estimated  one.  During  the  period  of  1911-13,  some  daily 
records  were  available  from  private  sources.  Tocher  proposed  to 
compare  these  actual  yields  with  those  estimated  by  the  Association 
to  see  if  the  estimate  is  any  good.  The  data  are  available  on  107 
cows,  and  we  will  take  the  difference  of  yields  of  milk  (Associa¬ 
tion  -  Private)  as  observations  and  examine  the  distributional 
property  of  these  107  observations.  A  plot  of  these  data  is  given 

A  A  A  A 

In  Figure  9  .  (e,o,3,a)  for  these  107  data  are  (.10,  11.09, 

.20,  .11),  approximate  confidence  regions  are  shown  in  Figure  lo. 
The  confidence  region  is  very  large  because  the  sample  size  is 
relatively  small,  and  it  is  elongated, as  expected. 

6.  Wiebe’s  Agricultural  Data 

Wiebe  (1935)  gives  the  grain  yield  of  1,500  rows  of 
Federation  wheat  in  a  uniformity  trial.  He  studied  the  variation 
and  correlation  in  grain  yield  among  the  1,500  wheat  nursery 
plots  to  seek  for  criteria  for  better  seeding  arrangement  ir 
future  field  trials.  Barbacki  and  Fisher  (1936)  used  his  data  to 
examine  the  relative  precision  of  systematic  and  randomized 
experimental  arrangements.  In  field  trials  of  this  kind  it  is 
well  known  that  results  from  plots  close  together  are  inevitably 


-32- 


Difference  of  yield  of  milk  In  gallons 


and  Private  records 


Figure  10  Approximate' confidence  regions  for  (B,a) 
(Difference  of  yields  of  milk  in  gallons 
between  Association  and  private  records) 


correlated,  and  blockirig  and  rando"i?aticn  v;ore  introduced  to 
ensure  proper  isolation  of  the  real  effects  of  treatments  in  these 
circumstances. 

This  1  ,500  rows  of  v-’hoat  were  grov/n  in  the  summer  of  1927 
on  the  Aberdeen  Substation,  Aberdeen,  Idaho.  The  1,500  rows  were 
grown  in  12  series,  each  having  125  rows.  Figure  ^  shows  the 
general  arrangement  of  the  crop.  The  rows  were  12  inches  apart 
and  15  feet  long.  The  plot  was  seeded  on  April  18  with  a  grain 
drill  that  sov/ed  eight  rows  at  a  time.  The  individual  rows  v/ere 
harvested  in  August  and  threshed  with  a  small  nursery  thresher. 

The  grain  yields,  recorded  in  grams  per  row  are  shown  in 
Table  .  A  contour  map  of  the  yield  obtained  in  the  plot  given 
by  Wiebe  is  shown  in  Figure  12  (The  contour  map  is  not  obvious 
from  the  data;  Wiebe  must  have  used  some  kind  of  smoothing  process 
which  is  not  given  in  his  paper).  Sequence  plots  of  all  tv/elve 
series  are  shown  in  Figure  13.  If  we  ignore  the  obvious  serial 
effects  and  treat  these  series  as  random  samples  drawn  from  some 
distribution,  then  for  any  data  set  we  can  estimate  6,  a,  8  and 
o  using  the  prodecure  described  on  page  4.  Thus  for  series  1,  we 

A  A  ^  A 

find  (0,0,8, a)  =  (651.6,  79.7,  0.45,  0.0)  with  the  approximate 
confidence  regions  shown  in  Figure  14  .  Although  the  maximum 
likelihood  estimates  suggest  an  approximate  uncontaminated  some¬ 
what  heavy-tailed  parent  distribution,  the  confidence  regions 
for  ((',«)  still  include  the  normal  distribution. 
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SERIES  1 


•  SERIES  2 


•SERIES  3 


Figure  13  Sequence  plot  of  Wicbo's  12  scries 
given  in  Table  3.2. 
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SERIE?  4 


SERIES  5 


SERIES  6 


SEOUCKCe  K3. 

Figure  13  continued 
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SERIES  7 


SERIES  8 


SERIES  9 


Figure  13  continued 
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SERIES  10 


SERIES,  n 


SERIES  12 


SEOUEKCE  K3.. 

Figure  13  continued 
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te  confidence  regions  for  (0,a) 
agricultural  data,  series  1) 


More  COT, plicated  models  v;hich  take  into  account  the 


serial  correlations  were  also  fitted  to  this  set  of  data.  We 
discuss  this  now. 

The  Time  Series  Model 

An  autoregressive  integrated  moving  average  model  (ARIfV\) 
was  developed  following  the  analysis  proposed  by  Box  and  Jenkins 
(1976).  The  sample  autocorrelation  function  (ACF)  and  partial 
autocorrelation  function  (PACF)  are  shown  in  Figure  15  .  They 

have  more  or  less  similar  patterns  which  indicate, as  might  be 
expected,  that  all  12  series  could  be  represented  by  the  same 
model.  Large  values  of  ACF  and  PACF  are  seen  at  lag  8,  16 
suggesting  the  presence  of  a  seasonal  effect  with  period  8.  This 
effect  almost  certainly  resulted  because  the  plots  were  seeded 
with  a  grain  drill  which  sowed  eight  rows  at  a  time.  After  some 
experimentation  it  was  found  that  a  satisfactory  model  was 

The  estimation  and  diagnostic  checking  results  are  displayed  in 

Table  3. 

Estimating  0,  a,  8  and  a  as  before  from  the  residuals, 

we  find 

(6,o,8»a)  **  (-1.1»  56.45,  .25,  0.0) 

with  90^  and  95%  confidence  regions  for  (8. a)  plotted  in 
Figure  3.16.  The  confidence  regions  are  seen  to  be  similar  to 
those  found  for  z^. 
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Figure  15(a)  ACF  for  Wiebe's  12  series  given  in  Table  2. 
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Figure  15(a)  continued 
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Figure  15(h)  continued 


-51- 


CIO 


Series  10 


ti 
•  4 

'  I 


Series  11 


•  ^ 

r« 


Scries  12 


€4 


Figure 


15(10 


K 

K 

K 


H 


>  X  K 
K  K  A  K 


xkkkKkxx-k 

K  K  »•  >. 

M  K  X 


.VX'V  >>.MXVKV  V 


X  K 


r»  ^  n  X  c  n  r<  ^  ^  ^  Cl  o  x  »-»  k  m  t-y  o  n 

o  ^  r»  r)  f»  ^  c»  t>  c)  •«<*•••  V  .  -o  * »  «  v  •  •  f  >  i •  %  •«  ••  •. 

ri  4  o  •«  •«  t*  ».*  •«  o  o  s.»  v»  ^  c*  ».>  -*  Kj  ^  ».»  V- 


I  I 


r><'xr'4’  o 


I 


:i 

•| 


X 
H 
K 
K 
X 

K  K 
K  K 
K  X 
X  K  X 
K  K  X 


K 

K 


;  V  K 


V  X 
X  X 


K  X 
V  X 


.-X  K 

XXX  XXX  X  XX  X 

,X  X  X  X  V  X  V  .X  X  X  X  /V  X  V  X  V  X  X  x  H  K  >  >:  X  X  X 

XK  X  XX  XX  XX  K 

K  X  K 

K 


K 

H 


K 

K 


K 

X 


K  «r  X  n  o  ^  n  o  —  X  X  c;  -.•  <  .t  *•  ->  fi  •'  •)  ••  v  s:  *•«  v 

^  X  r»  o  o  •«  f «  •«  o  o  o  o  C'  o  o  o  t*  o  c*  -•  o  1'  c»  o  i.‘  o 


««MMv4M»««i«v^«-4»4(4r4r4(«C4r4i4r<ir«i><n 


5; 

X 

s 


K 

K 

K 

X 

X 

V 

X  M  K 
K 


K  X  y 
X  X  V  K 
X  X  X  X 

V  X  M  K  y  V 

K  K 

K  H 


R 

K 

X 


^XXXXHVXXWXKKV 
X  X  K  X  XX  w 


V 

y  V 

V  X  K  X  K 


K 

K 


K 

X 


K  X 
>N  X 


t*  O  f4  l>  1'  N  »>  f»  '  ( '  I  >  i  »  V  f  I  ^  r«  »'  M  r .  :  1  .« 

^  O  »«  v*  t>  1  4  ••  •-•  C  ••  V'  •■•  i'  ••  O  *.»  O  V'  •*  "  O  ••  U  1'  o 


Mf^|<>^|r><KCD4'0»4Mf)4  n^xn^  %%oK  (nx 

««*4««*«»^*4»«*«»4*4l4r4l>(C4X  t*^llC4  (4  IN  I 


continued 


.S2- 


i 


•  9ETR 

Figure  16.  Aoproximate  confidence  regions  for  (B,a) 

{'..'iebe's  data:  residuals  from  time  series  model) 


In  general,  the  relationship  between  the  kurtosis  for 

and  is  readily  established  for  a  stationary  process 
<.(B)z^  =  0(B)a^. 

Consider  first  a  moving  average  process 


2^  =  (l-O^B-e2B^-...-0qB'’)a^  0.'s  not  all  zero 

“  V°i®t-rVt-2‘-’-'Vt-q ' 

If  independently  and  identically  distributed 

with  cumulants  k.(a^)  and  coefficient  of  kurtosis  then 

the  i-th  cumulant  for  z^  will  be 

k^(a^)+eik.(aJ+e^k.(a^.)+...+0jk.(a^)  =  (l+e]+...+0q)k.(a^), 

The  coefficient  of  kurtosis  is  then 

k|(zp  (l+ep...+0')“k|(ap  (U8(+...+e=)" ''  ^ 


It  is  easy  to  see  since 


I+0J+...+OJ 

(l+0|+...+0n2 


<  1  ,  that 


is  more  normal  than  a^  and  in  particular,  z^  is  normally 
distributed  provided  that  a^  is  so  distributed. 

Any  stationary  model  <{»(B)z^  =  0{B)a^  can  be  written  in 


the  form 
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*  I  •  where  I  converges. 

^  j=0  ^  j=0 


It  then  follows  that  in  general 


(j/l) 


-  x^Ca^) 


iikiL., 

o?r 


The  previous  conclusions  for  the  moving 


average,  therefore,  hold  for  any  stationary  series. 

What  we  have  is  evidently  a  central  limit  effect  which  makes 
2^  more  normal  than  a^.  In  the  important  special  case  of  a  first 


order  autoregressive  process 


»  a^ 


kl  <1 


(l+(j)B+(j.*6*+(J>^B*+...)a^ 


and  = 


(.iAi')' 


The  larger  ^  is,  the  smaller  will  be  and  the  stronger 

will  be  the  central  limit  effect. 


The  situation  is  different  for  a  nonstationary  process. 
.This  may  be  demonstrated  by  a  simulation  study  using  a 


random  walk  model . 

Generate  n  random  normal  deviates  a^,...,a^. 
t 

Calculate  ~  I  a.. 

I  1=1  ^ 

n  I  (z.-z)- 

1=1  ' 

Calculate  — - r*  -  3  v/hich  is  an  estimate  of  the 

( 

'l=l  ^ 

coefficient  of  kurtosis 

With  n  =  100,  five  such  simulations  gave  estimates  of 
of  -.44,  r.61,  -.56,  -.28,  -.61.  With  n  =  200, 
three  simulations  gave  estimates  of  of  -1.14,  -.95 
and  -2.0. 

For  Wiebe's  data,  nonstationary  models  were  tried  but  did 
not  fit  as  well  as  the  stationary  model.  The  observed  result  for 
this  data  that  the  tail  behavior  for  and  a^  is  similar  will 

be  expected  when  a^  is  nearly  normally  distributed  and  a  sta¬ 
tionary  model  is  appropriate. 

An  Alternative  Approach:  the  Block  Model 

Traditionally,  the  kind  of  design  that  has  been  used  for 
agricultural  data  employs  small  blocks.  Since  the  drill  machine 


(i) 

(ii) 

(iii) 
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sowed  eight  rows  at  a  time,  every  eight  rows  can  conveniently  be 

viewed  as  a  block  and  associated  with  a  block  effect  B-.  A  treat- 

J 

ment  effect  D.  could  be  associated  with  the  position  of  the  drill 
A  plausible  model  is  then 


t  =  1 .  ,125,  1  1  i|.  1  8, 
1  <  j^<  16 


't  '  ^mod  8  =  8 

'  [(t-l)/8]  +  1 

and  ZD.  =0  ZB.  =  0. 

\  Jt 

The  "design"  thus  produced  is  systematic  and  not  randomized.  How¬ 
ever,  we  might  allow  for  serial  correlation  in  the  errors  by 
letting  be  an  autoregressive  process 


(l-4>B)c^  =  where  a^'s  are  i.i.d.  N(0,o*). 


The  model  now  becomes 

2.  =  y+D.  +B.  +E^  with 
i  \  Jt  ^ 


(l-(})B)c^  =  a^ 

a^  i.i.d.  N(0,a2) 


Model  (1) 


The  parameters  can  now  be  estimated  using  nonlinear  least 

squares. 

Alternatively  wc  can  apply  linear  least  squares  estimation 
In  the  model 

*  [x]  means  tlie  intovger  part  of  x. 
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Y|.  =  +0.  +a.  where 

t  t  •’t 


Yt  =  V^-r 


V.  =  D.  -<.D.  .  B.  =  B.  -<^B, 

t  ’t  t-1  '^t 


't-1 


for  various  <>  values,  and  plot  the  residual  mean  square  against 

Ak 

<).  The  estimate  of  <{)  is  obtained  at  the  minimum  of  the  curve 
and  all  the  other  parameters  are  estimated  by  linear  least  squares 

A 

with  this  <J)  value. 

Figure  17  shows  this  estimation  procedure  for  Series  1 

A 

which  yields  (!>  =  0.26.  The  estimates  of  all  the  other  parameters 

are  given  in  Table  4  .  The  ACF  and  PACF  for  the  residuals  are 

given  in  Figure  18  .  No  significant  pattern  is  noticeable. 

If  we  now  look  back  to  the  time  series  models  we  have 
fitted,  we  see  that  even  the  best  fitted  model  for  Series  1  has  a 
residual  mean  square  of  over  2600  which  is  considerably  larger  than 
the  residual  mean  square  for  the  present  model  (2078).  Even  if  we 

did  not  allow  for  serial  correlation  for  the  errors  and  made  the 

assumption  that  the  errors  were  i.i.d.  N(0,o^),  i.e.,  <|>  =  0, 
the  residual  mean  square  2216  is  still  relatively  small  although 
ACF  and  PACF  for  the  residuals  (Figure  IS  )  do  now  show  some 
pattern.  It  is  clear,  however,  that  the  block  model  does  very 
v;ell  in  explaining  the  data. 
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Residual  Mean  Square 


Figure  17  Residual  mean  square  curve  with  the  block  mode 
for  Series  1. 


Table  4 


Estimates  of  the  parameters  in  model  (1) 


parameter 

estimated  value 

parameter 

estimated  value 

♦ 

.26 

.3 

V 

643.63 

»5 

-  4.4 
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Figure  19  ACF  and  PACF  for  residuals  obtained  from  fitting 
Series  1  to  model  (1). 


ata:  Residuals  obta 
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If  after  fittiruj  the  block  model  v;e  apply  our  analysis  to 
the  residuals,  we  obtain 

(O.o.P.u)  =  (-.023,  1.0,  .25,  0.0) 

and  approximate  confidence  regions  for  (n,ct)  are  shown  in  Figure 

^0. 

Again  comparison  with  Figure  M  shows  tliat  the  confidence 
region  is  remarkably  little  changed. 

7.  Conclusions 

From  this  study  of  real  data  sets,  we  conclude  the 
foil  owing: 

1)  Very  heavy-tailed  distributions  do  occur  in  practice. 
However,  they  are  often  caused  by  inhomogeneity  in  levels 
and  variances.  Serial  correlation  in  a  stationary  process 
makes  the  data  more  noniial  than  the  generating  shocks. 
However,  serial  correlation  in  a  finite  sequence  from  a  non¬ 
stationary  process  generated  from  normal  errors  produces 

an  apparent  light-tailed  distribution  for  z^. 

2)  Data  collected  over  a  long  period  of  time  usually  suffer 
from  secular  inhomogeneity  and  would,  therefore,  not  be 
expected  to  be  normally  distributed.  However,  much  of  the 
Inhomogeneity  of  this  kind  would  be  eliminated  or  reduced 
when  appropi'iate  designs  and  blocking  wore  adopted,  and 
appropriate  input  variables  were  included  in  the  model. 


3)  For  the  data  we  have  studied  which  was  reasonably 
homogeneous,  the  distributions  arc  most  frcciucntly  slightly 
heavy-tailed  and/or  contaminated.  However,  light-tailed 
and  contaminated  light-tailed  distributions  seem  also  to 

occur. 

4)  After  obvious  inhomogencitics  in  a  series  (for  example,  in 
most  cases, due  to  start-up  phenomena)  have  been  allowed  for, 
the  trade  off  between  a  and  0  causes  the  confidence  regions 
to  be  elongated  usually  and  to  include  suitably  contami¬ 
nated  normal  distributions.  We  believe,  therefore,  that 

most  data  sets  of  this  kind  could  be  adequately  described 
by  contaminated  normal  distributions  with  suitably  chosen 
a  and  k. 
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